Takao WAHO Tomoaki KOIZUMI Hitoshi HAYASHI
A feedforward (FF) network using ΔΣ modulators is investigated to implement a non-binary analog-to-digital (A/D) converter. Weighting coefficients in the network are determined so as to suppress the generation of quantization noise. A moving average is adopted to prevent the analog signal amplitude from exceeding the allowable input range of the modulators. The noise transfer function is derived and used to estimate the signal-to-noise ratio (SNR). The FF network output is a non-uniformly distributed multi-level signal, which yields a better SNR than a uniformly distributed one. The effect of characteristic mismatch among analog components on the SNR is also analyzed. Our behavioral simulations show that the SNR is improved by more than 30 dB, equivalent to about 5 bits of resolution, compared with a conventional first-order ΔΣ modulator.
A neural network that outputs reconstructed images from projection data containing scattered X-rays is presented, and the proposed scheme exhibits better accuracy than conventional computed tomography (CT), in which the scatter information is removed. In medical X-ray CT, it is common practice to remove scattered X-rays using a collimator placed in front of the detector. In this study, the scattered X-rays were assumed to carry useful information, and a method was devised to exploit this information effectively using a neural network. To this end, we generated 70,000 sets of projection data by Monte Carlo simulation, using as the target object a cube comprising 216 (6 × 6 × 6) smaller cubes with random density parameters. For each projection simulation, the densities of the smaller cubes were reset to different values, and detectors were deployed around the target object to capture the scattered X-rays from all directions. A neural network was then trained on these projection data to output the densities of the smaller cubes. Numerical evaluations confirmed that the neural-network approach utilizing scattered X-rays reconstructed images more accurately than the conventional method, in which the scattered X-rays are removed. These results suggest that utilizing scattered X-ray information can help significantly reduce patient dose during imaging.
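The conventional first-order ΔΣ modulator used as the baseline can be modeled behaviorally in a few lines. The sketch below is an illustration under stated assumptions: a 1-bit quantizer with ±1 output levels and DC inputs inside (-1, 1). The function name is hypothetical. The FF network, its weighting coefficients, and the moving average are not reproduced.

```python
import numpy as np


def first_order_delta_sigma(x: np.ndarray) -> np.ndarray:
    """Behavioral model of a conventional first-order delta-sigma modulator.

    Assumption: 1-bit quantizer with output levels +1/-1, inputs within (-1, 1).
    """
    integrator = 0.0
    y_prev = 0.0
    y = np.empty_like(x, dtype=float)
    for n, sample in enumerate(x):
        # Integrate the difference between the input and the fed-back output.
        integrator += sample - y_prev
        # 1-bit quantizer: the sign of the integrator state.
        y_prev = 1.0 if integrator >= 0.0 else -1.0
        y[n] = y_prev
    return y
```

Averaging the resulting bitstream recovers a DC input. For example, `first_order_delta_sigma(np.full(1000, 0.25)).mean()` is close to 0.25, because the bounded integrator state limits the accumulated error.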
Pengtao JIA Qi ZHAO Boze LI Jing ZHANG
Gait recognition distinguishes one individual from others according to the natural patterns of human gait. It is a challenging signal processing technology for biometric identification because of ambiguous contours and a complex feature extraction procedure. In this work, we propose a new model, the convolutional neural network (CNN) joint attention mechanism (CJAM), to classify gait sequences and perform person identification on the CASIA-A and CASIA-B gait datasets. The CNN extracts gait features, and the attention mechanism continuously focuses on the most discriminative areas to achieve person identification. We present a complete pipeline from gait image preprocessing to final identification. The results of 12 experiments show that the new attention model yields a lower error rate than the compared models. CJAM improved on the 3D-CNN, CNN-LSTM (long short-term memory), and simple CNN models by 8.44%, 2.94%, and 1.45%, respectively.
Xiongfei SHAN Mingyang PAN Depeng ZHAO Deqiang WANG Feng-Jang HWANG Chi-Hua CHEN
During the detection of maritime targets, jitter of the shipborne camera usually causes video instability and false or missed target detections. To tackle this problem, this study proposes a novel maritime target detection algorithm based on electronic image stabilization technology. The algorithm mainly consists of three models: the points line model (PLM), the points classification model (PCM), and the image classification model (ICM). The feature points (FPs) are first classified by the PLM, and the PCM then produces stable videos as well as target contours. The smallest bounding rectangles of the target contours are generated as candidate bounding boxes (bboxes) and sent to the ICM for classification. In the experiments, the ICM, which is built on a convolutional neural network (CNN), is trained and its effectiveness is verified. Our experimental results demonstrate that the proposed algorithm outperformed the benchmark models on all the common metrics. It reduced the mean square error (MSE) by at least 47.87%. It also improved the peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and mean average precision (mAP) by at least 8.66%, 6.94%, and 5.75%, respectively. The proposed algorithm is superior to state-of-the-art techniques in both image stabilization and target ship detection, providing reliable technical support for the visual development of unmanned ships.
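The MSE and PSNR figures quoted above follow the standard definitions, with PSNR = 10·log10(MAX²/MSE). The sketch below assumes 8-bit frames (MAX = 255). The abstract does not state how frames are paired for evaluation, and the function names are hypothetical.

```python
import numpy as np


def mse(reference: np.ndarray, test: np.ndarray) -> float:
    """Mean square error between two frames of the same shape."""
    # Cast first so that uint8 subtraction cannot wrap around.
    diff = reference.astype(np.float64) - test.astype(np.float64)
    return float(np.mean(diff ** 2))


def psnr(reference: np.ndarray, test: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    error = mse(reference, test)
    if error == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / error))
```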
Akira KITAYAMA Goichi ONO Tadashi KISHIMOTO Hiroaki ITO Naohiro KOHMU
Reducing power consumption is crucial for edge devices that use convolutional neural networks (CNNs). Zero-skipping is a processing technique for CNNs widely known for its relatively low power consumption and high speed. This approach stops multiplication and accumulation (MAC) when the product of the input data and the weight is zero. However, this technique requires large logic circuits with around 5% overhead, and the average MAC stopping rate is approximately 30%. In this paper, we propose a precise zero-skipping method that uses input data and simple logic circuits to stop multipliers and accumulators precisely. We also propose an active data-skipping method that further reduces power consumption at the cost of slightly degraded recognition accuracy. In this method, each multiplier and accumulator is also stopped when the input takes a small value (e.g., 1 or 2). We implemented the single shot multi-box detector 500 (SSD500) network model on a Xilinx ZU9 and applied our proposed techniques. The results were as follows:
- operations were stopped at a rate of 49.1%;
- recognition accuracy was degraded by 0.29%;
- power consumption was reduced from 9.2 to 4.4 W (-52.3%);
- circuit overhead was reduced from 5.1 to 2.7% (-45.9%).

The proposed techniques were thus shown to be effective for lowering the power consumption of CNN-based edge devices such as FPGAs.
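The difference between precise zero-skipping and active data-skipping can be sketched as a single MAC loop with a skip threshold. This is a software illustration only and assumes a hypothetical `skip_threshold` parameter. The paper's method is a hardware logic design on the FPGA, not this code.

```python
from typing import Sequence, Tuple


def mac_with_skipping(inputs: Sequence[int], weights: Sequence[int],
                      skip_threshold: int = 0) -> Tuple[int, int]:
    """Accumulate inputs[i] * weights[i], skipping MACs that are not performed.

    skip_threshold = 0 models precise zero-skipping: only pairs with a zero
    input or zero weight are skipped. A larger threshold models active
    data-skipping, where small inputs (|x| <= skip_threshold) are also skipped,
    trading a little accuracy for fewer operations.
    Returns (accumulated value, number of skipped MACs).
    """
    acc = 0
    skipped = 0
    for x, w in zip(inputs, weights):
        if abs(x) <= skip_threshold or w == 0:
            skipped += 1  # multiplier and accumulator stay idle for this pair
            continue
        acc += x * w
    return acc, skipped
```

For inputs `[0, 1, 2, 3]` with unit weights, precise zero-skipping skips one MAC and returns 6. A threshold of 2 skips three MACs and returns 3, which shows how accuracy is traded for a higher stopping rate.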
Takaaki SAEKI Yuki SAITO Shinnosuke TAKAMICHI Hiroshi SARUWATARI
This paper proposes two high-fidelity and computationally efficient neural voice conversion (VC) methods based on direct waveform modification using spectral differentials. The conventional spectral-differential VC method with a minimum-phase filter achieves high-quality conversion for narrow-band (16 kHz-sampled) VC but incurs a heavy computational cost in filtering. This is because the minimum phase obtained using a fixed lifter of the Hilbert transform often results in a long-tap filter. Furthermore, when the method is extended to full-band (48 kHz-sampled) VC, the computational cost grows with the increased number of sampling points, and the converted-speech quality degrades due to large fluctuations in the high-frequency band. To construct a short-tap filter, we propose a lifter-training method for data-driven phase reconstruction that trains a lifter of the Hilbert transform while taking filter truncation into account. We also propose a frequency-band-wise modeling method based on sub-band multi-rate signal processing (sub-band modeling method) for full-band VC. It enhances computational efficiency by reducing the number of sampling points of the signals converted with filtering, and it improves converted-speech quality by modeling only the low-frequency band. We conducted several objective and subjective evaluations to investigate the effectiveness of the proposed methods. These included an implementation of the real-time, online, full-band VC system we developed based on these methods. The results indicate the following:
1) the proposed lifter-training method for narrow-band VC can shorten the tap length to 1/16 without degrading the converted-speech quality;
2) the proposed sub-band modeling method for full-band VC can improve the converted-speech quality while reducing the computational cost;
3) our real-time, online, full-band VC system can convert 48 kHz-sampled speech in real time, achieving a mean opinion score of 3.6 out of 5.0 for naturalness.
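The conventional fixed lifter mentioned above is the standard real-cepstrum folding that turns a magnitude response into a minimum-phase filter. The sketch below assumes an even-length, symmetric magnitude response and uses hypothetical names. The paper's trained lifter would replace the fixed weights (1, then 2 for the causal part, 1 at the midpoint, 0 for the anti-causal part); this sketch does not attempt that.

```python
from typing import Optional

import numpy as np


def minimum_phase_filter(magnitude: np.ndarray, taps: Optional[int] = None) -> np.ndarray:
    """Minimum-phase impulse response with the given magnitude response.

    `magnitude` is a full-length, even-sized, symmetric |H(k)|, k = 0..N-1.
    Uses the fixed folding lifter of the real cepstrum, i.e., the discrete
    Hilbert-transform relation between log-magnitude and minimum phase.
    """
    n = magnitude.size
    log_mag = np.log(np.maximum(magnitude, 1e-10))
    cepstrum = np.fft.ifft(log_mag).real
    # Fixed lifter: keep c[0] and c[N/2], double the causal part, zero the rest.
    lifter = np.zeros(n)
    lifter[0] = 1.0
    lifter[1:n // 2] = 2.0
    lifter[n // 2] = 1.0
    spectrum = np.exp(np.fft.fft(cepstrum * lifter))
    impulse_response = np.fft.ifft(spectrum).real
    # Optional truncation: fewer taps mean cheaper filtering but a less exact response.
    return impulse_response if taps is None else impulse_response[:taps]
```

Without truncation, the magnitude of the returned filter's spectrum matches the input magnitude. Truncating the filter to a short tap length is where a fixed lifter loses accuracy, which is the problem the lifter-training method targets.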
Thi Diem TRAN Yasuhiko NAKASHIMA
Convolutional neural networks (CNNs) have come to dominate a range of applications, from advanced manufacturing to autonomous cars. For energy cost-efficiency, developing low-power hardware for CNNs is an active research trend. Because of their large input size, the first few convolutional layers generally account for most of the latency and hardware resources in a hardware design. To address these challenges, this paper proposes an innovative architecture named SLIT to extract feature maps and reconstruct the first few layers of CNNs. In this reconstruction approach, all multiply-accumulate operations are eliminated from the first layers. We evaluate the new topology on image classification with the MNIST, CIFAR, SVHN, and ImageNet datasets. Latency and hardware resources of the inference step are evaluated on the ZC7Z020-1CLG484C FPGA chip with the LeNet-5 and VGG schemes. On the LeNet-5 scheme, our architecture reduces latency by 39% and hardware resources by 70% compared with previous works, with a power consumption of 0.456 W. Although the VGG models achieve only a 10% reduction in hardware resources and latency, we hope our overall results will give new impetus to future studies aiming at further hardware optimization. Notably, the SLIT architecture merges efficiently with most popular CNNs at a slight cost in accuracy: 0.27% on MNIST, 0.5% to 1.5% on CIFAR, approximately 2.2% on ImageNet, and no loss on SVHN.
This letter presents an efficient technique to reduce the computational complexity involved in training binary convolutional neural networks (BCNNs). BCNN training should focus on optimizing the sign of each weight element rather than its exact value, as in conventional training. Moreover, once an element has been updated to a magnitude large enough to be clipped, its sign is unlikely to flip again. The proposed technique therefore stops updating elements that have been clipped and eliminates the computations involved in their optimization. It reduces the complexity by as much as 25.52% in training a BCNN model for the CIFAR-10 classification task, while the accuracy is maintained without severe degradation.
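The skipping rule described above can be sketched as a single update step on the real-valued latent weights. This sketch assumes plain SGD with latent weights clipped to [-1, 1] and treats any element already at the clip boundary as frozen. The function names are hypothetical, and the letter's exact training procedure may differ.

```python
from typing import Tuple

import numpy as np


def update_latent_weights(latent: np.ndarray, grad: np.ndarray, lr: float,
                          clip: float = 1.0) -> Tuple[np.ndarray, int]:
    """One SGD step on BCNN latent weights that skips already-clipped elements.

    Elements with |w| >= clip are treated as frozen: their sign is assumed
    settled, so their update is skipped (in a real training loop, the work
    spent on their gradients could be skipped as well).
    Returns (updated weights, number of skipped elements).
    """
    active = np.abs(latent) < clip
    updated = latent.copy()
    # Only active elements receive a gradient step.
    updated[active] -= lr * grad[active]
    # Elements that reach the clip boundary become frozen from the next step on.
    np.clip(updated, -clip, clip, out=updated)
    skipped = int(latent.size - np.count_nonzero(active))
    return updated, skipped


def binarize(latent: np.ndarray) -> np.ndarray:
    """Forward-pass binary weights: only the sign of each latent element matters."""
    return np.where(latent >= 0, 1.0, -1.0)
```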
Shakhnaz AKHMEDOVA Vladimir STANOVOV Sophia VISHNEVSKAYA Chiori MIYAJIMA Yukihiro KAMIYA
This study focuses on the automated detection of a complex system operator's condition. As an example, this study determined a person's reaction while listening to music (or not listening at all). For this purpose, various well-known data mining tools as well as ones developed by the authors were used. Specifically, artificial neural networks and fuzzy rule-based classifiers were developed and applied to these problems. The neural networks were generated by two modifications of the Differential Evolution algorithm based on the NSGA and MOEA/D schemes, which were proposed for solving multi-objective optimization problems. Fuzzy logic systems were generated by the population-based algorithm called Co-Operation of Biology Related Algorithms (COBRA). First, each person's state was monitored using non-contact Doppler sensors, which provided the databases for the problems described in this study. Experimental results demonstrated that automatically generated neural networks and fuzzy rule-based classifiers can properly determine the human condition and reaction. Moreover, the proposed approaches outperformed alternative data mining tools. Fuzzy rule-based classifiers were found to be more accurate and interpretable than neural networks, so they can be used to solve more complex problems related to the automated detection of an operator's condition.
Hiroaki KUDO Tetsuya MATSUMOTO Kentaro KUTSUKAKE Noritaka USAMI
In this paper, we evaluate a method for predicting regions that will contain dislocation clusters, a type of crystallographic defect, in photoluminescence (PL) images of multicrystalline silicon wafers. We applied transfer learning of a convolutional neural network to this task. Given a sub-region of a whole PL image as input, the network outputs whether dislocation cluster regions appear in the corresponding upper wafer image. The network was trained using images of the lower wafers at the bottom of dislocation clusters as positive examples. We tested three conditions for negative examples: images of wafers at a certain depth, randomly selected images, and both. We evaluated accuracy and Youden's J statistic in two cases: predicting the occurrence of dislocation clusters 10 wafers above and 20 wafers above the input wafer. Although the accuracies and Youden's J values are not very high, they exceed those of the bag-of-features (visual words) method. Our purpose is to find occurrences of dislocation clusters in upper wafers from the input wafer. For this purpose, randomly selected negative examples proved appropriate for the 10-upper-wafer prediction, as they consistently gave better results than the other negative-example conditions.
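Youden's J statistic is defined as sensitivity + specificity - 1, so unlike accuracy it is not inflated by a large majority class. The small sketch below computes both metrics from confusion-matrix counts. The function name is hypothetical.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy and Youden's J statistic from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "youden_j": sensitivity + specificity - 1.0,
    }
```

A J of 0 corresponds to chance-level discrimination and 1 to perfect discrimination, which makes J useful alongside accuracy when positive and negative examples are unbalanced.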
Motohiro TAKAGI Kazuya HAYASE Masaki KITAHARA Jun SHIMAMURA
This paper proposes a change detection method for buildings based on convolutional neural networks. The proposed method detects building changes from pairs of optical aerial images and past map information about buildings. By seamlessly combining high-resolution image pairs with past map information, the proposed method can capture building areas more precisely than a conventional method. Our experimental results show that the proposed method outperforms the conventional change detection method that uses optical aerial images to detect building changes.
Leilei KONG Yong HAN Haoliang QI Zhongyuan HAN
Source retrieval is the primary task of plagiarism detection. It searches for documents that may be the sources of plagiarism in a suspicious document. State-of-the-art approaches usually rely on classical information retrieval models, such as the probability model or the vector space model, to retrieve the plagiarism sources. However, the goal of source retrieval is to obtain the source documents that contain the plagiarized parts of the suspicious document, rather than to rank documents by relevance to the whole suspicious document. To model this “partial matching” between documents, this paper proposes a Partial Matching Convolution Neural Network (PMCNN) for source retrieval. Specifically, PMCNN uses a sequential convolutional neural network to extract the plagiarism patterns of contiguous text segments. Experimental results on the PAN 2013 and PAN 2014 plagiarism source retrieval corpora show that PMCNN significantly boosts source retrieval performance, outperforming other state-of-the-art document models.
Hao XIAO Kaikai ZHAO Guangzhu LIU
This work presents a DNN accelerator architecture specifically designed for efficient inference on compressed and sparse DNN models. Leveraging data sparsity, a runtime processing scheme is proposed that operates on the encoded weights and activations directly in the compressed domain, without decompression. Furthermore, a new dataflow is proposed to facilitate the reuse of input activations across the fully-connected (FC) layers. The proposed design is implemented and verified on a Xilinx Virtex-7 FPGA. Experimental results show that, when running AlexNet, it is 1.99× and 1.95× faster and 20.38× and 3.04× more energy efficient than the CPU and mGPU platforms, respectively.
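Processing in the compressed domain can be illustrated with a compressed sparse column (CSC) encoding of an FC layer. In this scheme, a zero activation skips its entire weight column, and each nonzero activation is reused for every nonzero weight in that column. This is a software sketch that assumes CSC as the encoding. The abstract does not specify the accelerator's actual encoding or dataflow, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def encode_csc(dense: Sequence[Sequence[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Encode a dense weight matrix (rows = output neurons) in CSC form."""
    n_rows, n_cols = len(dense), len(dense[0])
    values: List[float] = []
    row_index: List[int] = []
    col_ptr: List[int] = [0]
    for j in range(n_cols):
        for i in range(n_rows):
            if dense[i][j] != 0:
                values.append(dense[i][j])
                row_index.append(i)
        col_ptr.append(len(values))  # column j occupies values[col_ptr[j]:col_ptr[j+1]]
    return values, row_index, col_ptr


def sparse_fc(values: List[float], row_index: List[int], col_ptr: List[int],
              n_rows: int, activations: Sequence[float]) -> List[float]:
    """FC layer computed directly on the compressed weights."""
    out = [0.0] * n_rows
    for j, a in enumerate(activations):
        if a == 0:
            continue  # a zero activation skips its whole weight column
        # The activation is fetched once and reused across all nonzeros of column j.
        for k in range(col_ptr[j], col_ptr[j + 1]):
            out[row_index[k]] += values[k] * a
    return out
```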
Wang BO Zhang B. FANG Liu X. WEI Zou F. CHENG Zhang X. HUA
In this paper, the problem of malicious URL detection is investigated. First, a new P system is proposed. This P system is then used to design an optimization algorithm for a back-propagation (BP) neural network, achieving better malicious URL detection performance. Finally, experimental results on several examples demonstrate the advantages and effectiveness of the proposed optimization algorithm.
Masayuki ODAGAWA Takumi OKAMOTO Tetsushi KOIDE Toru TAMAKI Bisser RAYTCHEV Kazufumi KANEDA Shigeto YOSHIDA Hiroshi MIENO Shinji TANAKA Takayuki SUGAWARA Hiroshi TOISHI Masayuki TSUJI Nobuo TAMBA
In this paper, we present a hardware implementation of a colorectal cancer diagnosis support system for colorectal endoscopic video images on a customizable embedded DSP. In endoscopic video images, color shift, blurring, or light reflection occurs in lesion areas, which affects computer-based discrimination results. We therefore implement a computer-aided diagnosis (CAD) system that identifies lesions robustly and classifies them stably despite these video-frame-specific artifacts. The system targets colorectal endoscopic images with Narrow Band Imaging (NBI) magnification and uses Convolutional Neural Network (CNN) features with Support Vector Machine (SVM) classification. Since the CNN and SVM require many multiplication and accumulation (MAC) operations, we implement the proposed system on a customizable embedded DSP. This DSP supports high-speed MAC operations and parallel processing with Very Long Instruction Word (VLIW). Before implementation on the customizable embedded DSP, we profile and analyze the processing cycles of the CAD system and optimize the bottlenecks. We show the effectiveness of the real-time diagnosis support system on the embedded system for endoscopic video images. The prototype system achieved real-time processing at video frame rate (over 30 fps at 200 MHz) and more than 90% accuracy.
Lin YAN Mingyong ZENG Shuai REN Zhangkai LUO
Encrypted traffic identification aims to predict the traffic types of encrypted traffic. A deep residual convolutional network is proposed for this task. The Softmax classifier is fused with its angular variant, which imposes an angular margin to achieve better discrimination. The proposed method improves representation learning and achieves excellent results on a public dataset.
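The abstract does not specify which angular variant is used or how it is fused with Softmax. The sketch below assumes an additive angular margin on the target-class cosine, as in ArcFace-style losses, and a simple weighted sum of the two cross-entropy losses. `alpha`, `margin`, `scale`, and the function names are hypothetical.

```python
import numpy as np


def cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Mean softmax cross-entropy, computed stably via log-sum-exp."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def angular_margin_logits(features: np.ndarray, weights: np.ndarray, labels: np.ndarray,
                          margin: float = 0.3, scale: float = 16.0) -> np.ndarray:
    """Scaled cosine logits with an additive angular margin on the target class.

    features: (N, D); weights: (D, C), one column per class.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=0, keepdims=True)
    cos_theta = np.clip(f @ w, -1.0, 1.0)
    logits = cos_theta.copy()
    rows = np.arange(len(labels))
    theta = np.arccos(cos_theta[rows, labels])
    # Shrinking the target cosine forces the correct class to win by an angle of `margin`.
    logits[rows, labels] = np.cos(theta + margin)
    return scale * logits


def fused_loss(features: np.ndarray, weights: np.ndarray, labels: np.ndarray,
               alpha: float = 0.5, margin: float = 0.3, scale: float = 16.0) -> float:
    """Weighted sum of the plain Softmax loss and its angular-margin variant."""
    plain = cross_entropy(features @ weights, labels)
    angular = cross_entropy(angular_margin_logits(features, weights, labels, margin, scale), labels)
    return alpha * plain + (1.0 - alpha) * angular
```

For well-classified samples, a positive margin yields a larger loss than no margin. This keeps pushing features toward their class direction after the plain Softmax loss has saturated.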
Mikio HASEGAWA Hirotake ITO Hiroki TAKESUE Kazuyuki AIHARA
Recently, new optimization machines based on non-silicon physical systems, such as quantum annealing machines, have been developed, and their commercialization has begun. These machines solve problems by searching for the Ising spin state that minimizes the Ising Hamiltonian. This minimization property of the Ising Hamiltonian can be applied to various combinatorial optimization problems. In this paper, we introduce the coherent Ising machine (CIM), which can solve problems on the order of milliseconds. It also outperforms quantum annealing machines, especially on problems with dense mutual connections in the corresponding Ising model. We explain how a target problem can be implemented on the CIM, based on an optimization scheme using mutually connected neural networks. We apply the CIM to traveling salesman problems as an example benchmark and show experimental results obtained on a real CIM. We also apply the CIM to several combinatorial optimization problems in wireless communication systems, such as channel assignment problems. The CIM's ultra-fast optimization may enable real-time optimization of various communication systems, even in dynamic communication environments.
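The Ising Hamiltonian and the mutually connected neural network view can be illustrated with a classical Hopfield-style asynchronous descent. This is not a model of the CIM's optical dynamics. It assumes a symmetric coupling matrix J with zero diagonal and only shows that setting each spin to the sign of its local field never increases the energy. The function names are hypothetical.

```python
import numpy as np


def ising_energy(J: np.ndarray, h: np.ndarray, s: np.ndarray) -> float:
    """H(s) = -1/2 * sum_ij J_ij s_i s_j - sum_i h_i s_i, with spins s_i in {-1, +1}."""
    return float(-0.5 * s @ J @ s - h @ s)


def hopfield_descent(J: np.ndarray, h: np.ndarray, s0: np.ndarray,
                     max_sweeps: int = 100) -> np.ndarray:
    """Asynchronous updates of a mutually connected (Hopfield-type) network.

    With symmetric J and zero diagonal, each update cannot increase the Ising
    energy, so the state settles in a local minimum (not necessarily the global one).
    """
    s = s0.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            local_field = J[i] @ s + h[i]
            new_spin = 1 if local_field >= 0 else -1
            if new_spin != s[i]:
                s[i] = new_spin
                changed = True
        if not changed:  # no spin flipped during a full sweep: a fixed point
            break
    return s
```

For a fully connected ferromagnet (J_ij = 1 for i ≠ j) with no field, the descent from an alternating state ends with all spins aligned. That is a ground state of this simple Hamiltonian. Harder instances such as the traveling salesman problem can trap this descent in local minima, which motivates dedicated hardware such as the CIM.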
Noriyuki TONAMI Keisuke IMOTO Ryosuke YAMANISHI Yoichi YAMASHITA
Sound event detection (SED) and acoustic scene classification (ASC) are important research topics in environmental sound analysis. Many research groups have addressed SED and ASC using neural-network-based methods, such as the convolutional neural network (CNN), recurrent neural network (RNN), and convolutional recurrent neural network (CRNN). Conventional methods address SED and ASC separately, even though sound events and acoustic scenes are closely related. For example, in the acoustic scene “office,” the sound events “mouse clicking” and “keyboard typing” are likely to occur. Information on sound events and acoustic scenes is therefore expected to be mutually beneficial for SED and ASC. In this paper, we propose multitask learning for the joint analysis of sound events and acoustic scenes. In the proposed method, the parts of the networks holding information common to sound events and acoustic scenes are shared. We conducted experiments using the TUT Sound Events 2016/2017 and TUT Acoustic Scenes 2016 datasets. The results indicate that, compared with the conventional CRNN-based method, the proposed method improves the F-scores of SED and ASC by 1.31 and 1.80 percentage points, respectively.
Zhenhui XU Tielong SHEN Daizhan CHENG
This paper studies the infinite-time-horizon optimal control problem for continuous-time nonlinear systems. A completely model-free approximate optimal control design method is proposed that uses only real-time measured trajectory data instead of a dynamical model of the system. The approach is based on the actor-critic structure, in which the weights of the critic and actor neural networks are updated sequentially by the method of weighted residuals. Notably, an external input is introduced to replace the input-to-state dynamics in order to improve the control policy. Moreover, a rigorous proof of convergence to the optimal solution, along with the stability of the closed-loop system, is given. Finally, a numerical example demonstrates the efficiency of the method.
Rui YIN Zhiqun ZOU Celimuge WU Jiantao YUAN Xianfu CHEN Guanding YU
The unlicensed spectrum has been utilized to compensate for the spectrum shortage in new radio (NR) systems. To fully exploit the advantages of the unlicensed bands, one key issue is guaranteeing fair coexistence with WiFi systems. To this end, timely and accurate estimation of WiFi traffic loads is an important prerequisite. In this paper, a machine learning (ML)-based method is proposed to detect the number of WiFi users on the unlicensed bands. An unsupervised neural network (NN) structure is applied to filter the detected transmission collision probability on the unlicensed spectrum. This enables NR users to precisely rectify measurement errors and estimate the number of active WiFi users. Moreover, the NN is trained online, and its parameters and learning rate are jointly optimized to estimate the number of WiFi users adaptively and with high accuracy. Simulation results demonstrate that, compared with the conventional Kalman-filter-based detection mechanism, the proposed approach has lower complexity and achieves more stable and accurate estimation.